{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Erdos-Renyi Graphs\n",
    "\n",
    "Code examples from [Think Complexity, 2nd edition](http://greenteapress.com/wp/complexity2), Chapter 2\n",
    "\n",
    "Copyright 2016 Allen Downey, [MIT License](http://opensource.org/licenses/MIT)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from __future__ import print_function, division\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import networkx as nx\n",
    "import numpy as np\n",
    "\n",
    "# colors from our friends at http://colorbrewer2.org\n",
    "COLORS = ['#8dd3c7','#ffffb3','#bebada','#fb8072','#80b1d3','#fdb462',\n",
    "          '#b3de69','#fccde5','#d9d9d9','#bc80bd','#ccebc5','#ffed6f']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from thinkstats2 import RandomSeed\n",
    "RandomSeed(17)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Directed graph\n",
    "\n",
    "The first example is a directed graph that represents a social network with three nodes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "G = nx.DiGraph()\n",
    "G.add_node('Alice')\n",
    "G.add_node('Bob')\n",
    "G.add_node('Chuck')\n",
    "G.nodes()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's how we add edges between nodes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "G.add_edge('Alice', 'Bob')\n",
    "G.add_edge('Alice', 'Chuck')\n",
    "G.add_edge('Bob', 'Alice')\n",
    "G.add_edge('Bob', 'Chuck')\n",
    "G.edges()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And here's how to draw the graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nx.draw_circular(G, \n",
    "                 node_color=COLORS[0], \n",
    "                 node_size=2000, \n",
    "                 with_labels=True)\n",
    "plt.axis('equal')\n",
    "plt.savefig('chap02-1.pdf')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:**  Add another node and a few more edges and draw the graph again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Undirected graph\n",
    "\n",
    "The second example is an undirected graph that represents cities and the driving times between them.\n",
    "\n",
    "`pos` is a dictionary that maps from each city to its coordinates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pos = dict(Albany=(-74, 43),\n",
    "          Boston=(-71, 42),\n",
    "          NYC=(-74, 41),\n",
    "          Philly=(-75, 40))\n",
    "pos['Albany']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use the keys in `pos` to add nodes to the graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "G = nx.Graph()\n",
    "G.add_nodes_from(pos)\n",
    "G.nodes()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`drive_times` is a dictionary that maps from pairs of cities to the driving times between them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "drive_times = {('Albany', 'Boston'): 3,\n",
    "               ('Albany', 'NYC'): 4,\n",
    "               ('Boston', 'NYC'): 4,\n",
    "               ('NYC', 'Philly'): 2}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use the keys from `drive_times` to add edges to the graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "G.add_edges_from(drive_times)\n",
    "G.edges()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can draw the graph using `pos` to indicate the positions of the nodes, and `drive_times` to label the edges."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nx.draw(G, pos, \n",
    "        node_color=COLORS[1], \n",
    "        node_shape='s', \n",
    "        node_size=2500, \n",
    "        with_labels=True)\n",
    "\n",
    "nx.draw_networkx_edge_labels(G, pos, \n",
    "                             edge_labels=drive_times)\n",
    "\n",
    "plt.axis('equal')\n",
    "plt.savefig('chap02-2.pdf')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:**  Add another city and at least one edge."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Complete graph\n",
    "\n",
    "To make a complete graph, we use a generator function that iterates through all pairs of nodes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def all_pairs(nodes):\n",
    "    for i, u in enumerate(nodes):\n",
    "        for j, v in enumerate(nodes):\n",
    "            if i < j:\n",
    "                yield u, v"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`make_complete_graph` makes a `Graph` with the given number of nodes and edges between all pairs of nodes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def make_complete_graph(n):\n",
    "    G = nx.Graph()\n",
    "    nodes = range(n)\n",
    "    G.add_nodes_from(nodes)\n",
    "    G.add_edges_from(all_pairs(nodes))\n",
    "    return G"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's a complete graph with 10 nodes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "complete = make_complete_graph(10)\n",
    "complete.number_of_nodes()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And here's what it looks like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nx.draw_circular(complete, \n",
    "                 node_color=COLORS[2], \n",
    "                 node_size=1000, \n",
    "                 with_labels=True)\n",
    "plt.savefig('chap02-3.pdf')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `neighbors` method the neighbors for a given node."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "list(complete.neighbors(0))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** Make and draw complete directed graph with 5 nodes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Random graphs\n",
    "\n",
    "Next we'll make a random graph where the probability of an edge between each pair of nodes is $p$.\n",
    "\n",
    "The helper function `flip` returns True with probability `p` and False with probability `1-p`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from numpy.random import random\n",
    "\n",
    "def flip(p):\n",
    "    return random() < p"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`random_pairs` is a generator function that enumerates all possible pairs of nodes and yields each one with probability `p` "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def random_pairs(nodes, p):\n",
    "    for i, u in enumerate(nodes):\n",
    "        for j, v in enumerate(nodes):\n",
    "            if i<j and flip(p):\n",
    "                yield u, v"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`make_random_graph` makes an ER graph where the probability of an edge between each pair of nodes is `p`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def make_random_graph(n, p):\n",
    "    G = nx.Graph()\n",
    "    nodes = range(n)\n",
    "    G.add_nodes_from(nodes)\n",
    "    G.add_edges_from(random_pairs(nodes, p))\n",
    "    return G"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's an example with `n=10` and `p=0.3`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "RandomSeed(10)\n",
    "\n",
    "random_graph = make_random_graph(10, 0.3)\n",
    "len(random_graph.edges())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And here's what it looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nx.draw_circular(random_graph, \n",
    "                 node_color=COLORS[3], \n",
    "                 node_size=1000, \n",
    "                 with_labels=True)\n",
    "plt.savefig('chap02-4.pdf')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** The version of `random_pairs` in this section violates one of Python's design principles: Don't Repeat Yourself (DRY).\n",
    "\n",
    "Write a better version that uses `all_pairs` rather than repeating it. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Connectivity\n",
    "\n",
    "To check whether a graph is connected, we'll start by finding all nodes that can be reached, starting with a given node:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def reachable_nodes(G, start):\n",
    "    seen = set()\n",
    "    stack = [start]\n",
    "    while stack:\n",
    "        node = stack.pop()\n",
    "        if node not in seen:\n",
    "            seen.add(node)\n",
    "            stack.extend(G.neighbors(node))\n",
    "    return seen"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the complete graph, starting from node 0, we can reach all nodes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "reachable_nodes(complete, 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the random graph we generated, we can also reach all nodes (but that's not always true):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "reachable_nodes(random_graph, 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use `reachable_nodes` to check whether a graph is connected:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def is_connected(G):\n",
    "    start = list(G)[0]\n",
    "    reachable = reachable_nodes(G, start)\n",
    "    return len(reachable) == len(G)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, the complete graph is connected:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "is_connected(complete)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But if we generate a random graph with a low value of `p`, it's not:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "random_graph = make_random_graph(10, 0.1)\n",
    "len(random_graph.edges())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "is_connected(random_graph)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** What do you think it means for a directed graph to be connected?  Write a function that checks whether a directed graph is connected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Probability of connectivity\n",
    "\n",
    "Now let's estimare the probability that a randomly-generated ER graph is connected.\n",
    "\n",
    "This function takes `n` and `p`, generates `iters` graphs, and returns the fraction of them that are connected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def prob_connected(n, p, iters=100):\n",
    "    count = 0\n",
    "    for i in range(iters):\n",
    "        random_graph = make_random_graph(n, p)\n",
    "        if is_connected(random_graph):\n",
    "            count += 1\n",
    "    return count/iters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With `n=10` and `p=0.3`, the probability of being connected is about 65%."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "RandomSeed(17)\n",
    "\n",
    "n = 10\n",
    "prob_connected(n, 0.3, iters=10000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "According to Erdos and Renyi, the critical value of `p` for `n=10` is about 0.23. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pstar = np.log(n) / n\n",
    "pstar"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So let's plot the probability of connectivity for a range of values for `p`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ps = np.logspace(-1.3, 0, 11)\n",
    "ps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I'll estimate the probabilities with `iters=1000`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ys = [prob_connected(n, p, 1000) for p in ps]\n",
    "\n",
    "for p, y in zip(ps, ys):\n",
    "    print(p, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And then plot them, adding a vertical line at the computed critical value"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import thinkplot\n",
    "\n",
    "thinkplot.vlines([pstar], 0, 1, color='gray')\n",
    "thinkplot.plot(ps, ys)\n",
    "thinkplot.config(xlabel='p', ylabel='prob connected',\n",
    "                 xscale='log', xlim=[ps[0], ps[-1]])\n",
    "plt.savefig('chap02-5.pdf')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can run the same analysis for a few more values of `n`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ns = [30, 100, 300]\n",
    "ps = np.logspace(-2.5, 0, 11)\n",
    "\n",
    "thinkplot.preplot(len(ns))\n",
    "for n in ns:\n",
    "    pstar = np.log(n) / n\n",
    "    thinkplot.vlines([pstar], 0, 1, color='gray')\n",
    "\n",
    "    ys = [prob_connected(n, p) for p in ps]\n",
    "    thinkplot.plot(ps, ys, label='n=%d' % n)\n",
    "\n",
    "thinkplot.config(xlabel='p', ylabel='prob connected',\n",
    "                 xscale='log', xlim=[ps[0], ps[-1]],\n",
    "                 loc='upper left')\n",
    "plt.savefig('chap02-6.pdf')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As `n` increases, the critical value gets smaller and the transition gets more abrupt."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercises"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** In Chapter 2 we analyzed the performance of `reachable_nodes` and classified it in $O(n + m)$, where $n$ is the number of nodes and $m$ is the number of edges.  Continuing the\n",
    "analysis, what is the order of growth for `is_connected`?\n",
    "\n",
    "    def is_connected(G):\n",
    "        start = list(G)[0]\n",
    "        reachable = reachable_nodes(G, start)\n",
    "        return len(reachable) == len(G)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** In my implementation of `reachable_nodes`, you might be bothered by the apparent inefficiency of adding *all* neighbors to the stack without checking whether they are already in `seen`.  Write a version of this function that checks the neighbors before adding them to the stack.  Does this \"optimization\" change the order of growth?  Does it make the function faster?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def reachable_nodes_precheck(G, start):\n",
    "    # FILL THIS IN\n",
    "    return []"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%timeit len(reachable_nodes(complete, 0))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%timeit len(reachable_nodes_precheck(complete, 0))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "**Exercise:** There are actually two kinds of ER graphs.  The one we generated in the chapter, $G(n, p)$, is characterized by two parameters, the number of nodes and the probability of an edge between nodes.\n",
    "\n",
    "An alternative definition, denoted $G(n, m)$, is also characterized by two parameters: the number of nodes, $n$, and the number of edges, $m$.  Under this definition, the number of edges is fixed, but their location is random.\n",
    "\n",
    "Repeat the experiments we did in this chapter using this alternative definition.  Here are a few suggestions for how to proceed:\n",
    "\n",
    "1. Write a function called `m_pairs` that takes a list of nodes and the number of edges, $m$, and returns a random selection of $m$ edges.  A simple way to do that is to generate a list of all possible edges and use `random.sample`.\n",
    "\n",
    "2. Write a function called `make_m_graph` that takes $n$ and $m$ and returns a random graph with $n$ nodes and $m$ edges.\n",
    "\n",
    "3. Make a version of `prob_connected` that uses `make_m_graph` instead of `make_random_graph`.\n",
    "\n",
    "4. Compute the probability of connectivity for a range of values of $m$.\n",
    "\n",
    "How do the results of this experiment compare to the results using the first type of ER graph?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
